Abstract: The more complex the problem, the more complex
the system necessary for solving this problem. For
very complex problems, it is no longer possible to design
the corresponding system on a single resolution level;
it becomes necessary to have multiresolutional systems. When
analyzing  such  systems  —  e.g.,  when  estimating  their  per¬ 
formance  and/or  their  intelligence  —  it  is  reasonable  to  use 
the  multiresolutional  character  of  these  systems:  first,  we 
analyze  the  system  on  the  low-resolution  level,  and  then  we 
sharpen  the  results  of  the  low-resolution  analysis  by  con¬ 
sidering  higher-resolution  representations  of  the  analyzed 
system.  The  analysis  of  the  low-resolution  level  provides  us 
with  an  approximate  value  of  the  desired  performance  char¬ 
acteristic.  In  order  to  make  a  definite  conclusion,  we  need 
to  know  the  accuracy  of  this  approximation.  In  this  paper, 
we  describe  interval  mathematics  —  a  methodology  for  es¬ 
timating  such  accuracy.  The  resulting  interval  approach  is 
also  extremely  important  for  tessellating  the  space  of  search 
when  searching  for  optimal  control.  We  overview  the  corre¬ 
sponding  theoretical  results,  and  present  several  case  stud¬ 
ies. 

I.  Multiresolutional  Methods  are  Necessary:  A 
Brief  Reminder 

The  more  complex  the  problem,  the  more  complex  the 
system necessary for solving this problem. For very complex
problems, it is no longer possible to design the corresponding
system on a single resolution level; it becomes
necessary to have multiresolutional systems.

The  methodology  of  multiresolutional  search  for  the  op¬ 
timum  solution  of  a  control  problem  was  first  presented  by 
A. Meystel in [40], [41]. These papers contributed to the
broad interest in, and dissemination of, the multiresolutional
approach to solving problems in the areas of intelligent control
and intelligent systems.

Many algorithms based on this methodology have been developed
since then. The successful practical applications
of these algorithms show that multiresolutional approaches
are indeed necessary.

This  empirical  conclusion  has  been  supported  by  many 
mathematical  results;  let  us  name  a  few  recent  ones: 

• It has been proven that for general complex (NP-hard)
problems, i.e., problems for which no general feasible algorithm
is possible, there always exists an appropriate “granulation”
after which the problem becomes easy to solve.
The fact that the problem is NP-hard means that there is
no general algorithm for automatically finding such a granulation;
finding it requires an expert familiar with
the particular problem that we are trying to solve [11].


• For noisy images $I(x)$ in which we do not know the exact
statistical characteristics of the noise, only the upper
bound on the noise, the optimal image processing requires
representing this image as a linear combination of so-called
Haar wavelets $e_i(x)$, i.e., functions which only take values
1 or 0. Such a wavelet representation is a known particular
case of a multiresolutional representation [5], [6].

• In particular, when detecting a known pattern in a given
image,  it  is  provably  better  to  use  lower-resolution  type 
techniques  that  look  for  the  whole  pattern  as  opposed  to 
higher-resolution  techniques  which  look  for  pieces  of  this 
pattern  and  then  try  to  match  found  pieces  together  [64]. 

• Similarly to noisy images, for signal multiplexing under
noise,  the  use  of  Walsh  functions  (similar  to  Haar  wavelets) 
can  be  proven  to  be  the  optimal  choice  [2]. 

• In general, in function interpolation, clustering techniques
(in which we combine the values into clusters before
extrapolation) turn out to be optimal [34]. Such an interpolation
is very useful in intelligent control, when we train
a  system  by  providing  it  with  examples  of  control  values 
used  by  expert  human  controllers  in  different  situations. 

• In general, in intelligent control, hierarchical fuzzy control
is better in the sense that it requires fewer rules to
describe  the  same  quality  control  [35],  [36],  [77]. 

•  Finally,  it  can  be  shown  that  for  many  systems,  the  opti¬ 
mal  control  is  of  “bang-bang”  type,  when  there  are  finitely 
many  preferred  control  values  (or  preferred  fixed  control 
trajectories),  and  the  optimal  control  consists  of  optimally 
switching between these values (trajectories). This general
result  explains  different  empirical  phenomena  ranging  from 
the  empirical  fact  of  discrete  speed  levels  in  traffic  control 
to  the  phenomenon  of  sleep  when  it  seems  to  be  biologi¬ 
cally  optimal  to  always  switch  between  several  fixed  levels 
of  activity  [29]. 

II.  Interval  Mathematics:  A  Methodology  for 
Validated  Analysis  of  Multiresolutional 
Systems 

A.  Validated  Analysis  of  Multiresolutional  Systems  Natu¬ 
rally  Leads  to  Interval  Computations 

When  analyzing  multiresolutional  systems  -  e.g.,  when 
estimating  their  performance  and/or  their  intelligence  -  it 
is  reasonable  to  use  the  multiresolutional  character  of  these 
systems: first, we analyze the system on the low-resolution
level, and then we sharpen the results of the low-resolution
analysis by considering higher-resolution representations of
the  analyzed  system. 

For example, instead of the original image with its numerous
pixel-by-pixel brightness values, we consider a low-resolution
image in which there is a small finite number of
zones,  and  each  zone  is  characterized  by  a  single  brightness 
value.  After  analyzing  this  image,  we  increase  resolution, 
thus  adding  more  details  (more  zones),  etc. 

The  analysis  of  the  low-resolution  level  provides  us  with 
an  approximate  value  of  the  desired  performance  charac¬ 
teristic.  In  order  to  make  a  definite  conclusion,  we  need 
to  know  the  accuracy  of  this  approximation.  How  can  we 
estimate  this  accuracy? 

In  order  to  solve  this  problem,  let  us  reformulate  it  in 
general  mathematical  terms.  Instead  of  considering  the 
exact  system,  we  consider  its  approximation,  analyze  this 
approximation,  and  then  we  want  to  make  a  conclusion 
about the original system based on this analysis. The original
system is characterized by the values of different parameters
$x_1, \ldots, x_n$; e.g., for the image, these parameters
are the brightness values at different pixels. We want to estimate
some characteristic $q = f(x_1, \ldots, x_n)$ of the original
system.

A low-resolution approximation can usually be described
by fewer parameters $y_1, \ldots, y_m$, $m \ll n$; e.g., for the image,
these parameters are the brightnesses of different zones.
Each parameter $x_i$ is approximated by one of the new parameters
$y_j$; let us denote the corresponding parameter by
$y_{j(i)}$. When each $x_i$ is exactly equal to the corresponding
value $y_{j(i)}$, we get a simplified expression for $q$ which only
depends on $m \ll n$ values: $\tilde q = f(y_{j(1)}, \ldots, y_{j(n)})$. In reality,
the values $x_i$ are somewhat different from $y_{j(i)}$, and as
a result, the estimate $\tilde q$ is different from the actual value
$q$ of the desired characteristic. How can we estimate the
corresponding approximation error $\tilde q - q$?

In addition to the approximate model itself, we usually
know, for each $j$, the upper bound $\Delta_j$ on the error with which
the value $y_j$ approximates the corresponding values $x_i$. In
other words, we know that the actual value of $x_i$ belongs
to the interval $\mathbf{y}_j = [y_j - \Delta_j, y_j + \Delta_j]$ with $j = j(i)$. Since each value $x_i$
belongs to the interval $\mathbf{y}_{j(i)}$, the actual value of the desired
characteristic belongs to the range

$$\mathbf{q} = f(\mathbf{y}_{j(1)}, \ldots, \mathbf{y}_{j(n)}) \stackrel{\text{def}}{=} \{ f(x_1, \ldots, x_n) \mid x_i \in \mathbf{y}_{j(i)} \}$$

of the function $f$ on these intervals. Thus, in order to
estimate the accuracy of the lower-resolution estimate $\tilde q$,
we can estimate the above range.
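
As a hedged illustration of this setup (it is not part of the original analysis), the following Python sketch builds a low-resolution model of a 1-D brightness profile and encloses one simple characteristic, the average brightness. The function names, the choice of the midrange as the zone value, and the choice of the characteristic are assumptions made for this example only.

```python
def low_resolution_model(x, zone_size):
    """Split x into consecutive zones and return (y, delta, j_of).

    y[j]     : representative value of zone j (midrange; an assumption)
    delta[j] : upper bound on |x_i - y[j]| over all x_i in zone j
    j_of[i]  : index j(i) of the zone that approximates x[i]
    """
    y, delta, j_of = [], [], []
    for start in range(0, len(x), zone_size):
        zone = x[start:start + zone_size]
        lo, hi = min(zone), max(zone)
        y.append((lo + hi) / 2)
        delta.append((hi - lo) / 2)
        j_of.extend([len(y) - 1] * len(zone))
    return y, delta, j_of


def mean_brightness_enclosure(y, delta, j_of):
    """Interval guaranteed to contain the mean of the original values x_i.

    For a linear characteristic such as the mean, the range over the box
    of intervals [y_j - delta_j, y_j + delta_j] can be written explicitly.
    """
    n = len(j_of)
    q_approx = sum(y[j] for j in j_of) / n   # low-resolution estimate
    radius = sum(delta[j] for j in j_of) / n  # guaranteed error bound
    return q_approx - radius, q_approx + radius
```

For a nonlinear characteristic no such closed form is available in general, which is where the interval techniques discussed in the text come in.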

The problem of estimating the range of the function
$f(x_1, \ldots, x_n)$ when we know the intervals $\mathbf{x}_i$ of possible
values of $x_i$ is a known problem in areas where the inputs
are  not  known  precisely,  be  it  numerical  methods  or  data 
processing.  This  problem  is  called  the  problem  of  interval 
computations,  and  methods  for  solving  this  problem  are 
called  interval  mathematics  [1],  [16],  [17],  [19],  [20],  [44], 
[75]. 


B. Interval Computations are Difficult

In general, the interval computation problem is NP-hard
even for quadratic functions $f(x_1, \ldots, x_n)$; see, e.g., [26].
In plain English, this means that it is highly improbable
that we will be able to find a general feasible algorithm
that computes the exact range for all functions $f$ and all
intervals $\mathbf{x}_i$ in reasonable time. Since we cannot compute
the exact range, what can we do instead?

We wanted to compute the exact range $\mathbf{q}$ because we
wanted to get an interval that is guaranteed to contain the
desired value $q$, and the range definitely contains this value.
If we cannot compute the exact range in reasonable time,
we can compute an approximate interval $\mathbf{Q}$ for the range.
The only way to guarantee that the new interval still contains
$q$ is to make sure that this new interval contains the
entire range, $\mathbf{q} \subseteq \mathbf{Q}$, i.e., that this interval is an enclosure
for the desired range.

In these terms, interval mathematics is an art of computing
good narrow enclosures for the range of a given function
$f(x_1, \ldots, x_n)$ on given intervals $\mathbf{x}_1, \ldots, \mathbf{x}_n$.

C.  Methods  of  Interval  Mathematics:  A  Very  Brief  Intro¬ 
duction 

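Interval mathematics started, in the 1950s, with the observation
that for simple arithmetic operations $f(x_1, x_2) = x_1 + x_2$,
$x_1 - x_2$, etc., the range can be computed explicitly;
e.g.:

$$[x_1^-, x_1^+] + [x_2^-, x_2^+] = [x_1^- + x_2^-, \; x_1^+ + x_2^+];$$

$$[x_1^-, x_1^+] - [x_2^-, x_2^+] = [x_1^- - x_2^+, \; x_1^+ - x_2^-];$$

$$[x_1^-, x_1^+] \cdot [x_2^-, x_2^+] = [\min(x_1^- x_2^-, x_1^- x_2^+, x_1^+ x_2^-, x_1^+ x_2^+), \; \max(x_1^- x_2^-, x_1^- x_2^+, x_1^+ x_2^-, x_1^+ x_2^+)].$$

The corresponding expressions are called formulas of interval
arithmetic.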

It turns out that we can use these expressions to get reasonable
enclosures for arbitrary functions $f$. Indeed, when
the computer computes the function $f$, it parses the function,
i.e., it represents the computation as a sequence of
elementary arithmetic operations. It can be proven, by induction,
that if we start with intervals and replace each
arithmetic operation with the corresponding operation of
interval arithmetic, at the end, we get an enclosure for the range of $f$.
For example, if $f(x) = x \cdot (1 - x)$, we represent $f$ as a sequence
of two elementary operations:

• $r := 1 - x$ ($r$ denotes the 1st intermediate result);

• $y := x \cdot r$.

In the interval version, we perform the following computations:

• $\mathbf{r} := 1 - \mathbf{x}$;

• $\mathbf{y} := \mathbf{x} \cdot \mathbf{r}$.

In particular, when $\mathbf{x} = [0, 1]$, we compute the intervals
$\mathbf{r} := [1, 1] - [0, 1] = [0, 1]$, and

$$\mathbf{y} := [0, 1] \cdot [0, 1] = [\min(0 \cdot 0, 0 \cdot 1, 1 \cdot 0, 1 \cdot 1), \; \max(0 \cdot 0, 0 \cdot 1, 1 \cdot 0, 1 \cdot 1)] = [0, 1].$$

The interval $[0, 1]$ is indeed an enclosure of the actual range
$[0, 0.25]$.
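
The step-by-step interval evaluation of x · (1 − x) can be sketched in Python as follows. The `Interval` class and the function name `f_interval` are illustrative assumptions, not code from the cited packages; the sketch also ignores the outward rounding that a validated implementation would need to account for floating-point errors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # Smallest difference: our lower end minus the other's upper end.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # The extremes of a product are attained at endpoint combinations.
        products = (self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi)
        return Interval(min(products), max(products))


def f_interval(x):
    """Interval version of f(x) = x * (1 - x), one operation at a time."""
    r = Interval(1.0, 1.0) - x   # r := 1 - x
    return x * r                 # y := x * r
```

Calling `f_interval(Interval(0.0, 1.0))` returns `Interval(lo=0.0, hi=1.0)`. The result is wider than the actual range because the two occurrences of x are treated as if they could vary independently.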


D.  Modern  Methods  of  Interval  Mathematics  and  Their 
Potential  Use  in  Tessellating  the  Search  Space 

D.1 Methods Based on Mean Value Theorem

The  enclosure  obtained  by  using  the  above  simple  idea 
is  often  too  wide.  One  of  the  main  objectives  of  interval 
computations  is  to  make  this  enclosure  narrower.  One  way 
to do that is to use the mean value theorem, according
to which $f(x) = f(x_0) + f'(\xi) \cdot (x - x_0)$ for some value $\xi$
between $x_0$ and $x$. Thus, if we take, as $x_0$, the midpoint
of the interval $\mathbf{x}$ of width $w$, we will have $|x - x_0| \le w/2$,
$f'(\xi) \in f'(\mathbf{x})$, and thus, $f(\mathbf{x}) \subseteq f(x_0) + f'(\mathbf{x}) \cdot [-w/2, w/2]$.
If we do not know the exact range $f'(\mathbf{x})$, we can use the
enclosure  for  this  range.  Similar  formulas  can  be  easily 
written  for  the  case  of  several  variables. 
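
A minimal Python sketch of this mean-value enclosure, for the same example f(x) = x · (1 − x), is given below. Intervals are represented as `(lo, hi)` tuples; all function names are assumptions made for this illustration, and floating-point rounding is not controlled.

```python
def interval_mul(a, b):
    """Product of two intervals given as (lo, hi) tuples."""
    products = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return min(products), max(products)


def mean_value_enclosure(f, df_enclosure, x):
    """Enclosure f(x0) + f'(x) * [-w/2, w/2], with x0 the midpoint of x."""
    lo, hi = x
    x0 = (lo + hi) / 2
    half_width = (hi - lo) / 2
    slope = df_enclosure(x)  # enclosure of f' on the whole interval
    spread = interval_mul(slope, (-half_width, half_width))
    centre = f(x0)
    return centre + spread[0], centre + spread[1]


def f(x):
    return x * (1 - x)


def df_enclosure(x):
    # f'(x) = 1 - 2x is decreasing, so its range on [lo, hi] is exact here.
    lo, hi = x
    return 1 - 2 * hi, 1 - 2 * lo
```

On the narrow interval [0.4, 0.6], `mean_value_enclosure(f, df_enclosure, (0.4, 0.6))` gives approximately [0.23, 0.27], whereas the naive operation-by-operation evaluation gives [0.16, 0.36]; the actual range is [0.24, 0.25]. On wide intervals the mean-value form is not necessarily narrower.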

D.2  Methods  Based  on  Division  into  Subboxes  and  Their 
Relation  with  Multiresolutional  Approach 

In many cases, the above idea leads to a reasonable enclosure.
If the enclosure is still too wide, we can divide the
original box $\mathbf{x}_1 \times \ldots \times \mathbf{x}_n$ into subboxes, compute the
enclosure for each of these subboxes, and then take the union
of the resulting enclosures.
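
The effect of subdivision can be seen in the following Python sketch for the one-dimensional example f(x) = x · (1 − x). The function names are assumptions made for this illustration; the sketch returns the smallest single interval containing the union of the sub-enclosures and does not control rounding.

```python
def natural_enclosure(x):
    """Naive interval evaluation of f(x) = x * (1 - x) on x = (lo, hi)."""
    lo, hi = x
    r_lo, r_hi = 1 - hi, 1 - lo            # r := 1 - x
    products = (lo * r_lo, lo * r_hi, hi * r_lo, hi * r_hi)
    return min(products), max(products)    # y := x * r


def subdivided_enclosure(enclosure, x, pieces):
    """Split x into equal sub-intervals and merge their enclosures."""
    lo, hi = x
    step = (hi - lo) / pieces
    # Use hi itself as the last edge so that the pieces cover x exactly.
    edges = [lo + k * step for k in range(pieces)] + [hi]
    parts = [enclosure((edges[k], edges[k + 1])) for k in range(pieces)]
    return min(p[0] for p in parts), max(p[1] for p in parts)
```

With 1, 2, and 4 pieces on [0, 1], the enclosure shrinks from [0, 1] to [0, 0.5] to [0, 0.375], approaching the actual range [0, 0.25].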

It  is  worth  mentioning  that  this  idea  is  completely 
in line with the general multiresolutional approach: instead
of considering the individual values of the function
$f(x_1, \ldots, x_n)$ for all possible inputs $x_1, \ldots, x_n$, we divide
the domain of this function into a small number of zones, and
consider  the  enclosure  for  each  zone.  In  multiresolutional 
terms,  we  are  thus  considering  a  low-resolution  approxima¬ 
tion  to  the  original  function.  If  we  want  better  results,  we 
have  to  consider  smaller  zones,  i.e.,  we  have  to  consider 
higher-resolution  approximations. 

In other words, not only does the formulation of the main
problem of interval mathematics come naturally from the multiresolutional
approach, but the methods of interval
mathematics are also completely in line with this approach.

D.3  Interval  Mathematics  as  a  Method  for  Tessellating 
Search  Space 

The resulting interval approach is also extremely important
for tessellating the space of search when searching for
optimal control [19], [20]. The simplest way of using interval
computations to locate a maximum of the objective
function $f(x)$ is as follows:

First, we compute the values of $f(x)$ in several points
$x^{(1)}, \ldots, x^{(N)}$; we then know that $\max f(x) \ge M \stackrel{\text{def}}{=} \max_k f(x^{(k)})$.
Then, we divide the original domain into several
zones $Z_i$, use interval computations to get an enclosure
$\mathbf{F}_i = [F_i^-, F_i^+]$ of the range of $f(x)$ on each zone $Z_i$, and
dismiss the zones for which $F_i^+ < M$, because they cannot
contain the global maxima.

Then, we subdivide the remaining zones into sub-zones,
and repeat this procedure again, until we locate the global
maxima. This idea leads to reasonably efficient algorithms
for global optimization, which can be further enhanced
by using interval versions of gradient-based optimization
methods.
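
The procedure just described can be sketched in Python for a function of one variable. This is a simplified illustration under stated assumptions, not the algorithm of any particular package: zones are bisected, the stopping tolerance `tol` and the number of initial sample points are arbitrary choices, and rounding errors are ignored.

```python
def natural_enclosure(zone):
    """Naive interval enclosure of f(x) = x * (1 - x) on zone = (lo, hi)."""
    lo, hi = zone
    r_lo, r_hi = 1 - hi, 1 - lo
    products = (lo * r_lo, lo * r_hi, hi * r_lo, hi * r_hi)
    return min(products), max(products)


def f(x):
    return x * (1 - x)


def locate_maximum(f, enclosure, domain, tol=1e-3, samples=4):
    """Return (M, zones): a lower bound M on max f and the surviving zones."""
    lo, hi = domain
    # Step 1: sample f at a few points to obtain M <= max f.
    points = [lo + (hi - lo) * k / (samples - 1) for k in range(samples)]
    best = max(f(p) for p in points)
    zones = [domain]
    # Step 2: subdivide, and dismiss zones whose upper bound F+ is below M.
    while zones and max(b - a for a, b in zones) > tol:
        survivors = []
        for a, b in zones:
            mid = (a + b) / 2
            best = max(best, f(mid))       # midpoints may improve M
            for sub in ((a, mid), (mid, b)):
                if enclosure(sub)[1] >= best:
                    survivors.append(sub)
        zones = survivors
    return best, zones
```

For the example above, `locate_maximum(f, natural_enclosure, (0.0, 1.0))` returns M = 0.25 together with a small cluster of narrow zones around x = 0.5.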


Numerous  similar  methods  exist  for  computing  enclo¬ 
sures  and  optimization.  Most  of  these  methods  are  imple¬ 
mented  in  easily  available  software  packages;  see,  e.g.,  [19], 
[20], [75].

D.4  Conclusion:  Interval  Mathematics  Is  Very  Useful  for 
Multiresolutional  Approach 

Based on the above, we can conclude that interval mathematics
is a good candidate for being “the” mathematics of
multiresolutional systems.

D.5 We Will Present Examples of Applying Interval Computations

In  the  following  sections,  we  will  describe  two  applica¬ 
tions  of  interval  mathematics  in  some  detail.  Before  we  go 
into  the  descriptions,  we  should  mention  that  the  above  is 
the  description  of  a  “vanilla”  situation.  In  many  real-life 
cases,  the  situation  is  even  more  complex,  because,  in  addi¬ 
tion  to  a  quantitative  conclusion  (about  the  value  of  some 
quantity $q$), we need to make a qualitative conclusion: e.g.,
in  the  following  example,  a  conclusion  on  whether  a  plate 
has  a  hidden  fault  or  not. 

E.  Case  Study:  Non-Destructive  Testing 

This  case  study  is  described,  in  detail,  in  [65],  [72],  [73], 
[74]. 

In many areas, e.g., in the aerospace industry and in medicine,
it is desirable to detect mechanical faults without damaging
or reassembling the original system. For testing, we
send a signal and measure the resulting signal. The input
signal can be described by its intensities $r_1, \ldots, r_n$ at different
moments of time. The intensities $s_1, \ldots, s_m$ of the
resulting signal depend on $r_i$: $s_j = f_j(r_1, \ldots, r_n)$, where
the functions $f_j$ depend on the tested structure.

Usually, we do not know the exact analytical expression
for the dependency $f_j$, so we can use the fact that an arbitrary
continuous function can be approximated by a polynomial
(of a sufficiently large order). Thus, we can take
a  structure,  try  a  general  linear  dependency  first,  then,  if 
necessary,  general  quadratic,  etc.,  until  we  find  the  depen¬ 
dency  that  fits  the  desired  data. 

If  a  structure  has  no  faults,  then  the  surface  is  usually 
smooth.  As  a  result,  the  dependency  fj  is  also  smooth; 
we  can  expand  it  in  Taylor  series.  Since  we  are  sending 
relatively  weak  signals  (strong  signals  can  damage  the 
plane),  we  can  neglect  quadratic  terms  and  only  consider 
linear  terms  in  these  series;  thus,  the  dependency  will  be 
linear. 

A  fault  is,  usually,  a  violation  of  smoothness  (e.g.,  a 
crack).  Thus,  if  there  is  a  fault,  the  structure  stops  be¬ 
ing  smooth;  hence,  the  function  fj  stops  being  smooth, 
and  therefore,  linear  terms  are  no  longer  sufficient.  Thus, 
in  the  absence  of  fault,  the  dependence  is  linear,  but  with 
the  faults,  the  dependence  is  non-linear.  So,  we  can  detect 
the fault by checking whether the dependency between $s_j$
and $r_i$ is linear. So, we send several different inputs, measure
the values $r_i^{(k)}$ and $s_j^{(k)}$ corresponding to these inputs,
and check whether the dependence is linear. In this case,
the values $r_i^{(k)}$ and $s_j^{(k)}$ are the inputs $x_1, \ldots, x_n$, but the
desired  q  is  a  qualitative  (yes-no)  variable:  we  simply  want 
to  know  whether  there  is  a  fault  or  not.  If  there  is  a  fault, 
then  we  would  also  like  to  make  a  quantitative  conclusion 
of  its  size,  location,  etc.,  but  the  most  important  part  of 
the  analysis  is  to  check  whether  there  is  any  fault  at  all. 

If the measurements were ideal, all we would have to do is
check whether there are values $a_{ji}$ for which, for all $j$ and
for all measurements $k$, we have:

$$a_{j0} + a_{j1} \cdot r_1^{(k)} + \ldots + a_{jn} \cdot r_n^{(k)} = s_j^{(k)}.$$

Solvability  of  a  system  of  linear  equations  is  easy  to  check. 

In reality, the situation is more complicated. Measurements
are usually imprecise: the result $\tilde x$ of measuring the
actual value $x$ is somewhat different from the actual value
$x$. In many real-life situations, we do not know the probabilities
of different values of the measurement error $\Delta x = \tilde x - x$;
we only know the upper bound $\Delta$ on the corresponding measurement
error. As a result, the only information that we
have about the actual value $x$ of the measured quantity
is that it belongs to the interval $\mathbf{x} = [\tilde x - \Delta, \tilde x + \Delta]$. So,
in practice, instead of the exact values of $r_i^{(k)}$ and $s_j^{(k)}$,
we have intervals $\mathbf{r}_i^{(k)}$ and $\mathbf{s}_j^{(k)}$ of possible values of these
quantities. The question becomes: are these intervals consistent
with linearity, i.e., are there values $r_i^{(k)} \in \mathbf{r}_i^{(k)}$
and $s_j^{(k)} \in \mathbf{s}_j^{(k)}$ for which, for some values $a_{ji}$, the above
linearity formulas hold?

In  general,  the  solvability  of  the  corresponding  system  of 
interval  linear  equations  is  an  NP-hard  problem  [26],  but 
for  some  cases,  efficient  algorithms  have  been  developed. 
For example, when we have only one (non-negative) input
and only one output, with non-intersecting intervals
$\mathbf{r}^{(1)} < \mathbf{r}^{(2)} < \ldots$, the solvability of the corresponding system
of linear equations can be proven to be equivalent to
the following inequality, in which $\mathbf{r}^{(k)} = [r^{(k)-}, r^{(k)+}]$
and $\mathbf{s}^{(k)} = [s^{(k)-}, s^{(k)+}]$:

$$\max_{k<i} \frac{s^{(i)-} - s^{(k)+}}{r^{(i)+} - r^{(k)-}} \le \min_{k<i} \frac{s^{(i)+} - s^{(k)-}}{r^{(i)-} - r^{(k)+}}.$$

We  tested  this  method  on  the  dependence  of  the  energy  E 
of  the  ultrasound  response  on  the  voltage  V  that  causes 
the  original  ultrasound  signal.  The  results  show  that  non¬ 
linearity  is  indeed  an  indication  of  a  fault: 

•  For  faultless  plates,  the  above  inequality  is  indeed  true, 
meaning  that  the  measurement  results  are  consistent  with 
linearity. 

•  For  plates  with  faults,  this  inequality  is  not  satisfied, 
meaning  that  the  dependence  is  non-linear. 
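
The one-input, one-output test can be sketched in Python as follows. The sketch reads the inequality as a comparison of slope bounds: for every pair of measurements k < i, the quotient (s_i^- − s_k^+)/(r_i^+ − r_k^-) is treated as a lower bound on the slope and (s_i^+ − s_k^-)/(r_i^- − r_k^+) as an upper bound, and the largest lower bound must not exceed the smallest upper bound. The function name and the data in the usage note are hypothetical; the conditions under which this test is equivalent to solvability are those of the cited papers and are not re-derived here.

```python
def consistent_with_linearity(r, s):
    """Check the slope-bound inequality for interval measurements.

    r, s: lists of (lo, hi) intervals, ordered so that the input
    intervals do not intersect and r[k] lies to the left of r[i]
    whenever k < i (this keeps every denominator positive).
    """
    max_lower = float("-inf")
    min_upper = float("inf")
    for i in range(len(r)):
        for k in range(i):
            lower = (s[i][0] - s[k][1]) / (r[i][1] - r[k][0])
            upper = (s[i][1] - s[k][0]) / (r[i][0] - r[k][1])
            max_lower = max(max_lower, lower)
            min_upper = min(min_upper, upper)
    return max_lower <= min_upper
```

With made-up inputs r = 1, 2, 3 known to within ±0.05, outputs following s = 2r + 1 pass the test, while outputs following s = r² fail it.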

F.  Case  Study:  Reliable  Sub-Division  of  Geological  Areas 

This  case  study  is  described,  in  detail,  in  [7],  [8]. 

In  geophysics,  appropriate  subdivision  of  an  area  into 
segments  is  extremely  important,  because  it  enables  us  to 
extrapolate  the  results  obtained  in  some  locations  within 
the  segment  (where  extensive  research  was  done)  to  other 
locations  within  the  same  segment,  and  thus,  get  a  good 
understanding of the locations which weren’t that thoroughly
analyzed. The subdivision of a geological zone into
segments is often a controversial issue, with different evidence
and different experts’ intuition supporting different
subdivisions. 

For  example,  in  our  area  -  Rio  Grande  rift  zone  -  there 
is  some  geochemical  evidence  that  this  zone  is  divided  into 
three  segments  [39]: 

•  the  southern  segment  which  is  located,  approximately, 
between  the  latitudes  y  =  29°  and  y  =  34°; 

•  the  central  segment  -  from  y  =  34.5°  to  y  =  38°;  and 

•  the  northern  segment  -  from  y  =  38°  to  y  =  41°.

However, in the view of many researchers, this evidence
is not yet sufficiently convincing.

It is therefore desirable to develop new techniques for
zone sub-division, techniques which would depend as little
as possible on (subjective) expert opinion
and would, thus, be maximally reliable. To make this
conclusion more reliable, we use, instead of the scarcer
geological samples, the more abundant topographical information
(this information comes, e.g., from satellite photos).
We  can  characterize  each  part  of  the  divided  zone  by  its 
topography. 

In topographical analysis, we face a new problem: there is
too much data, most of which is geophysically irrelevant.
To eliminate some of this irrelevant data, we can use the
Fourier transform; indeed, it is known that while (at least
some) absolute values of the Fourier transform of the map
(forming a so-called spectrum) are geophysically meaningful,
the phases usually are
random and can therefore be ignored. So, we should only
use the spectrum.

Since  we  are  interested  only  in  the  large-scale  classifica¬ 
tion,  it  makes  sense  to  only  use  the  spectrum  values  corre¬ 
sponding  to  relatively  large  spatial  wavelengths,  i.e.,  wave¬ 
lengths  L  for  which  L  >  L0  for  some  appropriate  value  L0. 
In  particular,  for  the  sub-division  of  the  Rio  Grande  rift,  it 
makes  sense  to  use  only  wavelengths  of  L0  =  1000  km  or 
larger. 

Also, for the Rio Grande Rift, we are interested in the
classification of horizontal zones, so it makes sense to divide
the Rio Grande Rift into 1° zones $[y^-, y^+]$ (with $y$
from $y^- = 30$ to $y^+ = 31$, from $y^- = 31$ to $y^+ = 32$, ...,
from $y^- = 40$ to $y^+ = 41$). For each of these zones, we take
the topographic data, i.e., the height $h(x, y)$ described as a
function of longitude $x$ and latitude $y$, compute the Fourier
transform $H(\omega, y)$ with respect to $x$, combine all the spectral
values which correspond to large wavelengths (i.e., for
which $\omega \le 1/L_0$), and compute the resulting spectral value

$$\int_{y=y^-}^{y^+} \int_{\omega=0}^{1/L_0} |H(\omega, y)|^2 \, d\omega \, dy.$$

Since we are interested in comparing the spectral values
$S(y)$ corresponding to different latitudes $y$, we are not
interested in the absolute values of $S(y)$, only in relative
values. Thus, to simplify the data, we can normalize them
by, e.g., dividing each value $S(y^-)$ by the largest $S_{\max}$ of
these values. In particular, for the Rio Grande rift, the
resulting values of $y^- = y_1, y_2, \ldots$ and $s_i = S(y_i)/S_{\max}$ are
as follows:


TABLE I

y_i    29     30     31     32     33     34
s_i    0.28   0.24   0.21   0.16   0.20   0.29

y_i    35     36     37     38     39     40     41
s_i    0.31   0.35   0.46   1.00   0.80   0.96   0.74

Based only on these spectral values $s_i$, we will try to classify
locations  into  several  clusters  (“segments”). 

From  the  geophysical  viewpoint,  the  desired  zones  cor¬ 
respond  to  “monotonicity  regions”:  in  the  first  zone,  the 
values  Si  are  (approximately)  decreasing,  in  the  next  zone, 
they  are  (approximately)  increasing,  etc.  So,  we  must 
look  for  the  monotonicity  regions  of  the  (unknown)  func¬ 
tion  s(y). 

The problem is that the values $s_i$ are only approximately
known, so we cannot simply compare the values to determine
whether a function increases or decreases. The
heights are measured pretty accurately, so the only errors
in the values $s_i$ come from discretization. In other
words, we would like to know the values of the function
$s(y) = S(y)/S_{\max}$ for all $y$, but we only know the values
$s_1 = s(y_1), \ldots, s_n = s(y_n)$ of this function for the points
$y_1, \ldots, y_n$. For each $y$ which is different from all $y_i$, it is reasonable
to estimate $s(y)$ as the value $s_i = s(y_i)$ at the point
$y_i$ which is the closest to $y$ (and, ideally, which belongs to
the same segment as $y$). For each point $y_i$, what is the
largest possible error $\Delta_i$ of the corresponding approximation?

When $y > y_i$, the point $y_i$ is still the closest until we
reach the midpoint $y_{\rm mid} = (y_i + y_{i+1})/2$ between $y_i$ and
$y_{i+1}$. It is reasonable to assume that the largest possible
approximation error $|s(y) - s_i|$ for such points is attained
when the distance between $y$ and $y_i$ is the largest, i.e., when
$y$ is this midpoint; in this case, the approximation error is
equal to $|s(y_{\rm mid}) - s_i|$.

If the points $y_i$ and $y_{i+1}$ belong to the same segment,
then the dependence of $s(y)$ on $y$ should be reasonably
smooth for $y \in [y_i, y_{i+1}]$. Therefore, on a narrow interval
$[y_i, y_{i+1}]$, we can, with reasonable accuracy, ignore
quadratic and higher terms in the expansion of $s(y_i + \Delta y)$
and thus, approximate $s(y)$ by a linear function. For a
linear function $s(y)$, the difference $s(y_{\rm mid}) - s(y_i)$ is equal
to half of the difference $s(y_{i+1}) - s(y_i) = s_{i+1} - s_i$;
thus, for $y > y_i$, the approximation error is bounded by
$0.5 \cdot |s_{i+1} - s_i|$.

If the points $y_i$ and $y_{i+1}$ belong to different segments,
then the dependence $s(y)$ should exhibit some non-smoothness,
and it is reasonable to expect that the difference
$|s_{i+1} - s_i|$ is much higher than the approximation
error.

In both cases, the approximation error is bounded by
$0.5 \cdot |s_{i+1} - s_i|$.


Similarly, for $y < y_i$, the approximation error is bounded
by $0.5 \cdot |s_i - s_{i-1}|$ if the points $y_i$ and $y_{i-1}$ belong to the
same segment, and is much smaller than $|s_i - s_{i-1}|$ if they don’t. In both
cases, the approximation error is bounded by

$0.5 \cdot |s_i - s_{i-1}|$.

We have two bounds on the approximation error and we
can therefore conclude that the approximation error cannot
exceed the smallest $\Delta_i$ of these two bounds, i.e., the value

$$\Delta_i = 0.5 \cdot \min(|s_i - s_{i-1}|, |s_{i+1} - s_i|).$$

As a result, instead of the exact values $s_i$, for each $i$, we get
the interval $\mathbf{s}_i = [s_i^-, s_i^+]$ of possible values of $s(y)$, where
$s_i^- = s_i - \Delta_i$ and $s_i^+ = s_i + \Delta_i$. In particular, for the Rio
Grande rift, we get:

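$\mathbf{s}_1 = [0.26, 0.30]$, $\mathbf{s}_2 = [0.225, 0.255]$, $\mathbf{s}_3 = [0.195, 0.225]$,
$\mathbf{s}_4 = [0.14, 0.18]$, $\mathbf{s}_5 = [0.18, 0.22]$, $\mathbf{s}_6 = [0.28, 0.30]$,
$\mathbf{s}_7 = [0.30, 0.32]$, $\mathbf{s}_8 = [0.33, 0.37]$, $\mathbf{s}_9 = [0.405, 0.515]$,
$\mathbf{s}_{10} = [0.90, 1.10]$, $\mathbf{s}_{11} = [0.72, 0.88]$, $\mathbf{s}_{12} = [0.88, 1.04]$,
$\mathbf{s}_{13} = [0.63, 0.85]$.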
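
The computation of these intervals from the values in Table I can be sketched in Python as follows. The function name is an assumption made for this illustration, and so is the treatment of the first and last points, which have only one neighbor: for them the sketch uses the single available difference.

```python
# Normalized spectral values s_i from Table I (latitudes 29, 30, ..., 41).
TABLE_I = [0.28, 0.24, 0.21, 0.16, 0.20, 0.29, 0.31,
           0.35, 0.46, 1.00, 0.80, 0.96, 0.74]


def spectral_intervals(s):
    """Return the intervals [s_i - delta_i, s_i + delta_i].

    delta_i is half of the smaller of the differences between s_i and
    its neighbors; endpoints use their only neighbor (an assumption).
    Requires at least two values.
    """
    intervals = []
    for i, s_i in enumerate(s):
        diffs = []
        if i > 0:
            diffs.append(abs(s_i - s[i - 1]))
        if i < len(s) - 1:
            diffs.append(abs(s[i + 1] - s_i))
        delta = 0.5 * min(diffs)
        intervals.append((s_i - delta, s_i + delta))
    return intervals
```

Up to floating-point rounding, `spectral_intervals(TABLE_I)` starts with [0.26, 0.30], [0.225, 0.255], [0.195, 0.225] and ends with [0.63, 0.85].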

We want to find regions of monotonicity of a function $s(y)$,
but we do not know the exact form of this function; all we
know is that for every $i$, $s(y_i) \in \mathbf{s}_i$ for known intervals $\mathbf{s}_i$.
How can we find the monotonicity regions in the situation
with such interval uncertainty? Of course, since we only
know the values of the function $s(y)$ in finitely many points
$y_i$, this function can have arbitrarily many monotonicity regions
between $y_i$ and $y_{i+1}$. What we are interested in
is finding the subdivision into monotonicity regions which
can be deduced from the data. The first natural question is:
can we explain the data by assuming that the dependence
$s(y)$ is monotonic? If not, then we can ask for the possibility
of having a function $s(y)$ with exactly two monotonicity
regions:

•  if  such  a  function  is  possible,  then  we  are  interested  in 
possible  locations  of  such  regions; 

•  if  such  a  function  is  not  possible,  then  we  will  try  to  find 
a function s(y) which is consistent with our interval data
and  which  has  three  monotonicity  regions,  etc. 

This  problem  was  first  formalized  and  solved  in  [68],  [69], 
where  we  developed  a  linear-time  algorithm  for  solving  this 
problem.  By  applying  this  algorithm,  we  find  three  mono¬ 
tonicity  regions:  [29,34],  [31,41],  and  [37,41]  -  in  good 
accordance with the geochemical data from [39].
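
The basic building block of such an analysis, checking whether a run of intervals is consistent with a monotonic dependence, can be sketched in Python. This is not the linear-time algorithm of [68], [69], which is not reproduced in this text; it is a simple greedy test for weak (non-strict) monotonicity, and the function names are assumptions made for this illustration.

```python
def consistent_with_nondecreasing(intervals):
    """True if some values v_i in [lo_i, hi_i] satisfy v_1 <= v_2 <= ...

    Greedy argument: the smallest admissible value at step i is the
    running maximum of the lower endpoints, so it must not exceed hi_i.
    """
    running_max = float("-inf")
    for lo, hi in intervals:
        running_max = max(running_max, lo)
        if running_max > hi:
            return False
    return True


def consistent_with_nonincreasing(intervals):
    """Same test for v_1 >= v_2 >= ..., obtained by reversing the order."""
    return consistent_with_nondecreasing(list(reversed(intervals)))
```

For instance, the first five intervals listed in the text, from [0.26, 0.30] to [0.18, 0.22], are consistent with a non-increasing dependence, but adding the sixth interval [0.28, 0.30] breaks this consistency.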

G.  Other  Applications:  A  Brief  Overview 

Other  successful  applications  of  interval  techniques  in¬ 
clude: 

•  telemanipulation  [9],  [25],  [65]; 

•  robot  navigation  [65]; 

•  analysis  of  multi-spectral  satellite  images  [63],  [65]. 

Since a fuzzy set can be naturally represented as a nested
family of intervals (corresponding to different levels of certainty),
methods of fuzzy data processing actively use interval
computations and can be considered natural applications
of interval techniques [22], [50], [54], [65].
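
To illustrate the connection, the following Python sketch represents a fuzzy number by its nested intervals (often called alpha-cuts) and adds two fuzzy numbers level by level with interval arithmetic. The triangular shape of the membership function, the chosen levels, and the function names are all assumptions made for this example; they are not taken from the cited works.

```python
def alpha_cut(tri, alpha):
    """Interval of values whose membership degree is at least alpha.

    tri = (a, b, c) describes a triangular fuzzy number that rises
    from a to its peak at b and falls back to zero at c.
    """
    a, b, c = tri
    return a + alpha * (b - a), c - alpha * (c - b)


def fuzzy_add(tri1, tri2, levels=(0.0, 0.5, 1.0)):
    """Add two fuzzy numbers by adding their intervals at each level."""
    result = {}
    for alpha in levels:
        lo1, hi1 = alpha_cut(tri1, alpha)
        lo2, hi2 = alpha_cut(tri2, alpha)
        result[alpha] = (lo1 + lo2, hi1 + hi2)  # interval addition
    return result
```

The resulting intervals are again nested: higher levels of certainty give narrower intervals.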


III.  Multi-D  Generalizations  of  Interval 
Mathematics  and  Symmetry  Approach 

A.  General  Idea 

In addition to the upper bound on the approximation error
for each quantity $x_i$, we often have additional information.
For example, in some cases, in addition to the upper
bounds $\Delta_i$ for the differences $x_i - \tilde x_i$, we also know the
upper bound on the distance between the vectors $x$ and $\tilde x$,
i.e., the upper bound on $\sqrt{(x_1 - \tilde x_1)^2 + \ldots + (x_n - \tilde x_n)^2}$.
In this case, we know that the actual values of $x_1, \ldots, x_n$
belong to the intersection of a box $\mathbf{x}_1 \times \ldots \times \mathbf{x}_n$ and a ball.
We  may  have  more  complex  shapes.  Processing  complex 
shapes  is  computationally  difficult  (see,  e.g.,  [32]),  so  we 
must  find  good  approximations  for  such  shapes.  Ideally, 
we  should  find  approximations  which  are  optimal  in  some 
reasonable  sense. 
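
As a small illustration of why the choice of approximating shape matters, the following sketch tests membership in the intersection of a box and a ball, and computes the tightest box that encloses this intersection. It assumes that the box and the ball share the same center (the measured vector); the names `in_box_and_ball` and `tightest_enclosing_box` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def in_box_and_ball(x: Sequence[float], center: Sequence[float],
                    deltas: Sequence[float], radius: float) -> bool:
    """Check |x_i - center_i| <= deltas_i for all i and ||x - center|| <= radius."""
    in_box = all(abs(xi - ci) <= di for xi, ci, di in zip(x, center, deltas))
    dist = math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, center)))
    return in_box and dist <= radius


def tightest_enclosing_box(center: Sequence[float], deltas: Sequence[float],
                           radius: float) -> List[Tuple[float, float]]:
    """Smallest box containing the intersection of the box and the ball.

    Along each axis, the largest deviation that still satisfies both
    constraints is min(deltas_i, radius).
    """
    return [(c - min(d, radius), c + min(d, radius))
            for c, d in zip(center, deltas)]
```

For example, with center (0, 0), bounds (1, 0.3) and radius 0.5, the point (0.45, 0.25) lies in the enclosing box but outside the ball, so even the best box approximation contains points that are not actually possible.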

A  similar  problem  of  finding  the  optimal  shapes  arises 
in  the  selection  of  “clusters”  (zones)  corresponding  to  the 
low-resolution  approximation.  Here  also,  it  is  desirable  to 
find  the  optimal  zones. 

Let  us  show,  on  the  example  of  selecting  zones  on  the 
plane,  how  this  problem  can  be  solved  (a  more  general  case 
is  described  in  [47]). 

Of  course,  the  more  parameters  we  allow,  the  better  the 
approximation.  So,  the  question  can  be  reformulated  as 
follows:  for  a  given  number  of  parameters  (i.e.,  for  a  given 
dimension  of  approximating  family) ,  which  is  the  best  fam¬ 
ily? 

For simplicity, we will restrict ourselves to families of
sets that have analytical (or piece-wise analytical) boundaries,
i.e., boundaries that can be described by an equation
$F(x,y) = 0$ for some analytical function $F(x,y) = a +
bx + cy + dx^2 + exy + fy^2 + \ldots$ Since we are interested
in finite-dimensional families of sets, it is natural to consider
finite-dimensional families of functions, i.e., families
of the type $\{C_1 \cdot F_1(x,y) + \ldots + C_d \cdot F_d(x,y)\}$, where $F_i(x,y)$
are given analytical functions, and $C_1, \ldots, C_d$ are arbitrary
(real) constants. So, the question is: which of such families
is the best?

When we say “the best”, we mean that on the set of all
such families, there must be a relation $\geq$ describing which
family is better or equal in quality. This relation must be
transitive (if $A$ is better than $B$, and $B$ is better than $C$,
then $A$ is better than $C$). This relation is not necessarily
asymmetric, because we can have two approximating families
of the same quality. However, we would like to require
that this relation be final in the sense that it should define
a unique best family $A_{\rm opt}$ (i.e., the unique family for which
$\forall B\,(A_{\rm opt} \geq B)$). Indeed:

•  If  none  of  the  families  is  the  best,  then  this  criterion  is 
of  no  use,  so  there  should  be  at  least  one  optimal  family. 

• If several different families are equally best, then we can
use this ambiguity to optimize something else: e.g., if we
have two families with the same approximating quality,
then we choose the one which is easier to compute. As
a result, the original criterion was not final: we get a new
criterion ($A \geq_{\rm new} B$ if either $A$ gives a better approximation,
or if $A \sim_{\rm old} B$ and $A$ is easier to compute), for which
the class of optimal families is narrower. We can repeat
this procedure until we get a final criterion for which there
is only one optimal family.

It is reasonable to require that the relation $A \geq B$ should
be invariant relative to natural geometric symmetries, i.e.,
shift-, rotation- and scale-invariant.

Now,  we  are  ready  for  the  formal  definitions. 

Definition 1. Let $d > 0$ be an integer. By a $d$-dimensional
family, we mean a family $A$ of all functions of the type
$\{C_1 \cdot F_1(x,y) + \ldots + C_d \cdot F_d(x,y)\}$, where $F_i(x,y)$ are given
analytical functions, and $C_1, \ldots, C_d$ are arbitrary (real)
constants. We say that a set is defined by this family
$A$ if its border consists of pieces described by equations
$F(x,y) = 0$, with $F \in A$.

Definition 2. By an optimality criterion, we mean a transitive
relation $\geq$ on the set of all $d$-dimensional families. We
say that a criterion is final if there exists one and only one
optimal family, i.e., a family $A_{\rm opt}$ for which $\forall B\,(A_{\rm opt} \geq B)$.
We say that a criterion $\geq$ is shift-invariant (corr., rotation-invariant,
scale-invariant) if for every two families $A$ and $B$, $A \geq B$ implies
$TA \geq TB$, where $TA$ is a shift (corr., rotation, scaling) of the
family $A$.

Theorem [33], [71]. ($d \leq 4$) Let $\geq$ be a final optimality
criterion which is shift-, rotation-, and scale-invariant, and
let $A_{\rm opt}$ be the corresponding optimal family. Then, the
border of every set defined by this family $A_{\rm opt}$ consists of
straight line intervals and circular arcs.

For  d  =  5  and  d  =  6,  we  also  get  hyperbolas,  parabolas, 
and  ellipses  [55]. 
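
To see why straight lines and circular arcs fit together in a single low-dimensional family, consider the following worked example. It is an illustration consistent with the theorem, not its proof, and the specific choice of basis functions is our assumption.

```latex
% A 4-dimensional family spanned by F_1 = 1, F_2 = x, F_3 = y, F_4 = x^2 + y^2:
F(x,y) = C_1 + C_2 x + C_3 y + C_4 (x^2 + y^2).
% Case C_4 \neq 0: completing the squares in F(x,y) = 0 gives a circle
\left(x + \frac{C_2}{2C_4}\right)^2 + \left(y + \frac{C_3}{2C_4}\right)^2
  = \frac{C_2^2 + C_3^2 - 4 C_1 C_4}{4 C_4^2}
% (non-empty when the right-hand side is positive).
% Case C_4 = 0: F(x,y) = 0 is the straight line C_1 + C_2 x + C_3 y = 0.
% Invariance: a shift (x,y) \to (x+s, y+t) maps x^2 + y^2 to
x^2 + y^2 + 2 s x + 2 t y + (s^2 + t^2),
% which is again a combination of F_1, ..., F_4; a rotation preserves
% x^2 + y^2 and mixes x and y linearly; a scaling multiplies each F_i
% by a constant. So the family is mapped onto itself by all three symmetries.
```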

A  similar  symmetry-based  optimization  technique  can  be 
used  to  find  the  optimal  technique  for  subdividing  boxes  in 
interval  range  estimation  and  interval  optimization;  see, 
e.g.,  [21]. 

B.  Case  Studies:  Brief  Overview 
B.1 Analyzing Cotton Images

The  above  approach  has  been  very  helpful  in  the  auto¬ 
matic  analysis  of  cotton  images  [55],  [61].  Specifically,  the 
above  symmetry-based  approach  helps  in  classifying  trash 
(bark,  leaves,  etc.)  in  ginned  cotton  and  in  classifying  in¬ 
sects  by  their  shapes.  The  symmetry  approach  enables  us 
not  only  to  find  the  optimal  shapes,  but  also  to  find  the  op¬ 
timal  geometric  characteristics  for  distinguishing  between 
different shapes and different sizes of the same shape. The
same  symmetry  approach  leads  to  the  conclusion  that  the 
optimal  approximations  to  sizes  form  a  geometric  progres¬ 
sion;  this  conclusion  is  in  good  accordance  with  the  actual 
insect  sizes. 

B.2  Half-Orders  of  Magnitude 

A similar geometric progression result explains why,
when people make crude estimates, they feel comfortable
choosing between alternatives which differ by a half-order
of magnitude (e.g., were there 100, 300, or 1,000 people
in the crowd), and less comfortable making a choice on a
more detailed scale, with finer granules, or on a coarser
scale (like 100 or 1,000) [18]. This empirical fact is difficult
to explain within standard uncertainty formalisms like
fuzzy logic; see, e.g., [31].
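
A half-order of magnitude corresponds to a ratio of $\sqrt{10} \approx 3.16$ between neighboring granules, so the values 100, 300, 1,000 in the example above are rounded members of a geometric progression. The sketch below generates such a scale; the function name and the small tolerance used to absorb floating-point rounding are our assumptions.

```python
import math
from typing import List


def half_order_scale(start: float, stop: float) -> List[float]:
    """Values start, start*sqrt(10), start*10, ... that do not exceed stop."""
    ratio = math.sqrt(10.0)  # one half-order of magnitude
    values = []
    k = 0
    # The factor (1 + 1e-9) keeps the end point despite rounding errors.
    while start * ratio ** k <= stop * (1 + 1e-9):
        values.append(start * ratio ** k)
        k += 1
    return values


scale = half_order_scale(100, 1000)  # approximately 100, 316, 1000
```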

B.3  Analyzing  Geospatial  Data  II 

Computer  processing  can  drastically  improve  the  quality 
of  an  image  and  the  reliability  and  accuracy  of  a  spatial 
database.  A  large  image  (database)  does  not  easily  fit  into 
the  computer  memory,  so  we  process  it  by  downloading 
pieces  of  the  image.  Each  downloading  takes  a  lot  of  time, 
so,  to  speed  up  the  entire  processing,  we  must  use  as  few 
pieces  as  possible. 

Many algorithms for processing images and spatial
databases consist of comparing the value at a certain spatial
location with values at nearby locations. For such algorithms,
we must select (possibly overlapping) sub-images in
such  a  way  that  for  each  point,  its  neighborhood  (of  given 
radius)  belongs  to  a  single  sub-image.  In  [3],  we  formulate 
the  corresponding  optimization  problem  in  precise  terms, 
and  show  (in  good  accordance  with  the  above  optimization 
result)  that  the  optimal  sub-images  should  be  bounded  by 
straight  lines  or  circular  arcs. 

B.4  Analyzing  Geospatial  Data  III 

Geospatial databases often contain erroneous measurements.
For some of these databases, such as gravity databases,
the known methods of detecting erroneous measurements
- based on regression analysis - do not work well. As a
result,  to  clean  such  databases,  experts  use  manual  meth¬ 
ods  which  are  very  time-consuming.  In  [70],  we  propose  a 
(natural)  multiresolutional  (localized)  version  of  regression 
analysis  as  a  technique  for  automatic  cleaning.  Specifically, 
we  subdivide  the  original  image  into  zones,  and  apply  re¬ 
gression  analysis  separately  within  each  zone  (on  the  high- 
resolution  level)  and  between  different  zones  (on  a  low- 
resolution  level). 

In this physical problem, natural requirements lead to
the following optimality criterion for selecting zones: minimizing
the zone’s diameter (that describes the variance
within the zone) under a given area (that describes the number
of measurements within the zone). The efficiency of
the resulting optimal zones is shown on the example of the
gravity database, where our algorithm not only detected all
erroneous measurements found manually by the experts,
but also uncovered several suspicious points that the experts
overlooked.

B.5  Non-Destructive  Testing  II 

A standard way of detecting faults is to measure a certain
quantity $x$ at different points on the analyzed plate, and
to classify a point as faulty if the value $x$ of the
measured quantity at this point differs from the average $a$
of the measurement results by more than two or three standard
deviations $\sigma$.

Based on the results of measuring a single quantity (e.g.,
ultrasonic signal), we often miss some faults. To improve
the quality of fault detection, it is necessary to measure several
different quantities, and combine the results of these
measurements. A natural idea is to classify the point as
faulty if one of the measurements detects a fault. However,
since one of the measurements may be erroneous, we would
rather consider a point a fault location if at least one other
measured quantity at this or a nearby point also indicates a fault.

In  other  words,  to  improve  the  quality  of  fault  detection, 
we  replace  the  original  point-by-point  analysis  by  a  new 
method  which  involves  high-resolution  clustering.  When 
the  corresponding  neighborhoods  are  selected  in  an  optimal 
way,  this  replacement  indeed  improves  the  quality  of  fault 
detection  [58],  [59]. 

A further improvement in fault detection comes when
we treat the physically different points near the plate’s edge
as a different zone, and classify a point as faulty only if the
corresponding value $x$ differs from the average $a_z$ within
this zone by more than two or three standard deviations
$\sigma_z$ measured within this zone. In other words, a further
improvement in fault detection comes when we supplement
the above high-resolution technique by additional
low-resolution subdivision into zones.
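
The zone-based criterion can be sketched as follows. This is an illustration only, not the method of [58], [59]: it assumes that each measurement already carries a zone label (e.g., "interior" or "edge"), and the function name, the default $k = 2$, and the use of the population standard deviation are our choices.

```python
from statistics import mean, pstdev
from typing import Dict, Hashable, List, Sequence


def faulty_points(values: Sequence[float], zones: Sequence[Hashable],
                  k: float = 2.0) -> List[int]:
    """Indices i whose value differs from its zone average by more than k sigma."""
    by_zone: Dict[Hashable, List[float]] = {}
    for v, z in zip(values, zones):
        by_zone.setdefault(z, []).append(v)
    # Average a_z and standard deviation sigma_z, computed separately per zone.
    stats = {z: (mean(vs), pstdev(vs)) for z, vs in by_zone.items()}
    return [i for i, (v, z) in enumerate(zip(values, zones))
            if abs(v - stats[z][0]) > k * stats[z][1]]


# Made-up data: interior readings near 10 (one outlier, 14), edge readings near 20.
interior = [10, 10.1, 9.9, 10, 10.2, 9.8, 10, 10, 10, 14]
edge = [20, 20.5, 19.5, 20, 20.2, 19.8]
values = interior + edge
zones = ["interior"] * len(interior) + ["edge"] * len(edge)

with_zones = faulty_points(values, zones)
single_zone = faulty_points(values, ["plate"] * len(values))
```

On this made-up data, the per-zone analysis flags the outlier, while treating the whole plate as one zone does not, because the systematic difference between edge and interior inflates the overall standard deviation.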

B.6  Why  Two  Sigma 

In the above example, and in statistics in general, a two-sigma
criterion is used. The normal justification for this
criterion is that for $k \approx 2$, the dependence of the probability
to be outside the $k \cdot \sigma$ interval $[a - k \cdot \sigma, a + k \cdot \sigma]$ on the
(unknown) probability distribution is the smallest. In [52],
[53], we provide a theoretical explanation for this empirical
fact, and thus, for the “$2\sigma$” criterion.

For that, we take into consideration the fact that an arbitrary
probability distribution can be represented as $f(\eta)$,
where $\eta$ is normally distributed, so the choice of a distribution
is equivalent to the choice of a function $f(x)$.
A symmetry-based approach similar to the one presented
above leads to the family $f(x) = x^\alpha$, and for this family,
in the vicinity of the normal distribution (when $\alpha \approx 1$), the
smallest dependence on $\alpha$ is indeed attained for $k \approx 2$.

B.7  Acupuncture  Points 

The  above  approach  to  describing  optimal  shapes  can 
be  successfully  applied  to  finding  a  good  approximation 
for  the  location  of  the  acupuncture  points,  i.e.,  points  in 
which  acupuncture  treatment  is  the  most  efficient  [46]. 

B.8  Towards  Optimal  Image  Compression 

In  the  above  image  processing  problems,  we  process  the 
image  as  it  appears.  In  many  situations,  we  must  store  the 
image  for  future  use,  and  there  is  not  enough  storage  space 
to  store  all  the  images,  so  we  need  to  compress  the  image. 
In  other  situations,  there  is  not  enough  bandwidth  to  send 
the  entire  image,  so  again,  compression  is  needed. 

It  is  proven  that  finding  the  optimal  compression  of  a 
given  image,  be  it  an  optimal  lossless  compression  or  an 
optimal  lossy  compression  with  a  given  bound  on  allowable 
loss  of  information,  is  a  computationally  difficult  problem 
[66]. Since we cannot find the optimal compression, a natural
idea is to consider several compression techniques and
find the best one. The problem is to quantify what “the
best” means. This is especially difficult when the compressed
image may have several possible applications: since we do
not know where exactly this image will be used,
it is difficult to quantify the quality of the compression. In
[23], [49], we consider the optimal choice of the quality metric
most appropriate for a given problem. First, we use
a similar symmetry-based optimization approach to find the optimal
family of possible quality metrics (which turns out to be
$L^p$-metrics), and then, we find $p$ based on a specific problem.

B.9  Pattern  Matching 

In  many  real-life  situations,  we  are  interested  in  finding 
the  known  pattern  in  a  given  image.  For  example,  in  the 
analysis  of  geospatial  data,  we  may  be  looking  for  certain 
geophysical  patterns  indicative  of,  say,  presence  of  water. 
In  [10],  [12],  [13],  [14],  [62],  [78],  a  similar  symmetry-based 
optimality  approach  is  used  to  develop  optimal  FFT-based 
techniques  for  such  matching. 

B.10  Guaranteed  Quality  Estimation  for  Approximately 
Given  Systems 

Our final example brings us back to the original problem
- of quality estimation for an approximately given system.
A symmetry-based approach can help in designing optimal
methods for such quality estimation for the situations when
the system is treated as a “black box”, a low-resolution approximation
to the original system in which we are not allowed
to use the high-resolution details [24], [67]. In particular,
in [24], [67], we describe modified Monte-Carlo techniques
which provide us with validated results even when
we do not know the exact values of the statistical characteristics
of the system - only intervals of possible values of
such characteristics.

IV.  Multiresolutional  Approach  to  Reasoning 
and  Logic:  A  Brief  Overview 

A.  Reasoning  and  Logic:  Successes  and  Problems 

The multiresolutional approach can be applied not only to
the  systems  themselves,  but  also  to  the  way  we  reason 
about  these  systems,  i.e.,  to  the  logic  of  human  reasoning. 
Specifically,  in  many  areas  (medicine,  geophysics,  military 
decision-making,  etc.),  top  quality  experts  make  good  deci¬ 
sions,  but  they  cannot  handle  all  situations.  It  is  therefore 
desirable  to  incorporate  their  knowledge  into  a  decision¬ 
making  computer  system. 

Experts describe their knowledge by statements
$S_1, \ldots, S_n$ (e.g., by if-then rules). Experts are often not
100% sure about these statements $S_i$; this uncertainty is
described by the subjective probabilities $p_i$ (degrees of belief,
etc.) which experts assign to their statements. The
conclusion $C$ of an expert system normally depends on
several statements $S_i$. For example, if we can deduce $C$
either from $S_2$ and $S_3$, or from $S_4$, then the validity of
$C$ is equivalent to the validity of a Boolean combination
$(S_2 \,\&\, S_3) \vee S_4$. So, to estimate the reliability $p(C)$ of the
conclusion, we must estimate the probability of Boolean
combinations. In this paper, we consider the simplest possible
Boolean combinations: $S_1 \,\&\, S_2$ and $S_1 \vee S_2$.

In general, the probability $p(S_1 \,\&\, S_2)$ of a Boolean combination
can take different values depending on whether $S_1$
and $S_2$ are independent or correlated. So, to get the precise
estimates of probabilities of all possible conclusions,
we must know not only the probabilities $p(S_i)$ of individual
statements, but also the probabilities of all possible
Boolean combinations. To get all such probabilities, it
is sufficient to describe $2^n$ probabilities of the combinations
$S_1^{\varepsilon_1} \,\&\, \ldots \,\&\, S_n^{\varepsilon_n}$, where $\varepsilon_i \in \{+,-\}$, $S^+$ means $S$,
and $S^-$ means $\neg S$. The only condition on these probabilities
is that they should add up to 1, so we need
to describe $2^n - 1$ different values. A typical knowledge
base may contain hundreds of statements; in this case, the
value $2^n - 1$ is astronomically large. We cannot ask experts
about all $2^n$ such combinations, so in many cases,
we must estimate $p(S_1 \,\&\, S_2)$ or $p(S_1 \vee S_2)$ based only on
the values $p_1 = p(S_1)$ and $p_2 = p(S_2)$. There exist many
possible “and”-operations $f_\& : [0,1] \times [0,1] \to [0,1]$ which
transform the degrees $p_1$ and $p_2$ into an estimate $f_\&(p_1,p_2)$
for $p(S_1 \,\&\, S_2)$. Similarly, there exist many “or”-operations
which transform the degrees $p_1$ and $p_2$ into an estimate
$f_\vee(p_1,p_2)$ for $p(S_1 \vee S_2)$.

Many  such  operations  have  been  successfully  used  in 
fuzzy  logic  and  intelligent  control;  see,  e.g.,  [22],  [56].  In 
spite  of  the  successes,  there  are  still  major  problems  with 
these  operations: 

• First, these operations are not perfect. Indeed, some of
these operations, although very natural and useful at first
glance, seem to violate natural commonsense requirements
(we will give an example later).

• Second, there are so many different possible “and”- and
“or”-operations that it is difficult to meaningfully select one
of them. Any guidance for narrowing down the class of possible
operations is very welcome.

B.  Reasoning  and  Logic:  Multiresolutional  Approach 

In our opinion, the above problems of the existing logical
methodologies come, to a large extent, from the fact
that researchers often combine different degrees of certainty
together. In reality, the degrees have a clear multiresolutional
character, and if we fully take this character into
consideration, we can make significant progress in solving the
above problems.

Let  us  explain  why  expert  degrees  of  uncertainty  are  mul¬ 
tiresolutional.  An  expert  rarely  provides  us  with  numbers 
describing  his  or  her  degrees  of  uncertainty.  A  more  nat¬ 
ural  way  for  an  expert  to  describe  his/her  degree  of  belief 
in  a  certain  statement  is  to  use  a  word  from  natural  lan¬ 
guage  such  as  “most  probably”  or  “possibly” ,  and  then  we 
translate this word into a number. There are only a few such
words,  and  these  words  form  the  lowest-resolution  level  of 
the  uncertainty  description.  On  this  level,  several  differ¬ 
ent  statements  with  slightly  different  degrees  of  uncertainty 
may  be  described  by  the  same  word  and  thus,  lumped  into 
a single cluster. To avoid this lumping, we may ask an
expert to provide us with a more detailed description of
the expert’s degree, e.g., by using hedged combinations of
words like “slightly less certain but still reasonably certain”.
The more details we ask for, the higher-resolution the
description we get.

Another  possibility  to  describe  the  expert’s  degrees  in 
numerical  terms  is  to  ask  the  expert  to  describe  his/her 
degrees  on  a  scale  from,  say,  0  to  10.  We  can  start  with 
a low-resolution scale, e.g., with a scale consisting of only
two  values  “yes”  and  “no”  that  corresponds  to  the  use  of 
the  classical  (two-valued)  logic.  As  we  increase  the  num¬ 
ber  of  elements  on  the  scale,  we  get  a  higher-  and  higher- 
resolution  description.  Eventually,  we  get  real  numbers 
describing  uncertainty. 

In  both  cases,  we  get  numbers  as  a  result,  but  these  num¬ 
bers  appear  as  a  result  of  a  multiresolutional  procedure.  It 
is  therefore  natural,  when  resolving  the  above  problems  -  of 
seeming  inconsistency  with  common  sense  and  of  too  many 
options  -  to  consider  not  only  the  resulting  assignments  of 
numbers,  but  also  the  multiresolutional  approximations  to 
these  assignments.  This  consideration  indeed  helps  in  solv¬ 
ing  the  above  problems. 

C.  Multiresolutional  Character  of  Uncertainty  Reasoning 
Resolves  the  Inconsistency  Between  Uncertainty  Oper¬ 
ations  and  Common  Sense 

Let us give one example of such inconsistency and show
how the multiresolutional character of human reasoning
can help with this particular example. It is known that
for given $p_1 = p(S_1)$ and $p_2 = p(S_2)$, possible values of
$p(S_1 \,\&\, S_2)$ form an interval $\mathbf{p} = [p^-, p^+]$, where $p^- =
\max(p_1 + p_2 - 1, 0)$ and $p^+ = \min(p_1, p_2)$; and possible
values of $p(S_1 \vee S_2)$ form an interval $\mathbf{p} = [p^-, p^+]$, where
$p^- = \max(p_1, p_2)$ and $p^+ = \min(p_1 + p_2, 1)$ (see, e.g., a survey
[48] and references therein). So, in principle, we can use
such interval estimates and get an interval $\mathbf{p}(C)$ of possible
values of $p(C)$. Sometimes, this idea leads to meaningful
estimates, but often, it leads to a useless $\mathbf{p}(C) = [0,1]$ [47],
[57]. In such situations, it is reasonable, instead of using
the entire interval $\mathbf{p}$, to select a point within this interval as
a reasonable estimate for $p(S_1 \,\&\, S_2)$ (or, correspondingly,
for $p(S_1 \vee S_2)$).
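
The interval bounds above are easy to compute, and applying them to intervals rather than to exact probabilities shows how the uncertainty grows along a chain of Boolean operations. The sketch below is an illustration under our assumptions: the function names are hypothetical, and applying the formulas endpoint-wise to interval inputs gives a guaranteed but not necessarily tight enclosure.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound) of a probability


def point(p: float) -> Interval:
    """Exactly known probability, represented as a degenerate interval."""
    return (p, p)


def and_bounds(a: Interval, b: Interval) -> Interval:
    """Possible values of p(S1 & S2) when p(S1) is in a and p(S2) is in b."""
    return (max(a[0] + b[0] - 1.0, 0.0), min(a[1], b[1]))


def or_bounds(a: Interval, b: Interval) -> Interval:
    """Possible values of p(S1 v S2) when p(S1) is in a and p(S2) is in b."""
    return (max(a[0], b[0]), min(a[1] + b[1], 1.0))


# Conclusion C = (S2 & S3) v S4 with p(S2) = p(S3) = p(S4) = 0.5.
conj = and_bounds(point(0.5), point(0.5))
concl = or_bounds(conj, point(0.5))
```

Here the conjunction is only known to lie in [0, 0.5], and the conclusion in [0.5, 1]: both intervals are already half as wide as the entire range [0, 1] after one or two operations.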

Since the only information we have, say, about the unknown
probability $p(S_1 \,\&\, S_2)$ is that it belongs to the interval
$[p^-, p^+]$, it is natural to select a midpoint of this interval
as the desired estimate:

$$f_\&(p_1,p_2) \stackrel{\rm def}{=} \frac{1}{2} \cdot \max(p_1 + p_2 - 1, 0) + \frac{1}{2} \cdot \min(p_1, p_2);$$

$$f_\vee(p_1,p_2) \stackrel{\rm def}{=} \frac{1}{2} \cdot \max(p_1, p_2) + \frac{1}{2} \cdot \min(p_1 + p_2, 1).$$

This midpoint selection is not only natural from a common
sense viewpoint; it also has a deeper justification. Namely,
in accordance with our above discussion, for $n = 2$ statements
$S_1$ and $S_2$, to describe the probabilities of all possible
Boolean combinations, we need to describe $2^2 = 4$ probabilities
$x_1 = p(S_1 \,\&\, S_2)$, $x_2 = p(S_1 \,\&\, \neg S_2)$, $x_3 = p(\neg S_1 \,\&\, S_2)$,
and $x_4 = p(\neg S_1 \,\&\, \neg S_2)$; these probabilities should add up
to 1: $x_1 + x_2 + x_3 + x_4 = 1$. Thus, each probability distribution
can be represented as a point $(x_1, \ldots, x_4)$ in a 3-D
simplex $S = \{(x_1, x_2, x_3, x_4) \mid x_i \geq 0 \,\&\, x_1 + \ldots + x_4 = 1\}$.
We know the values of $p_1 = p(S_1) = x_1 + x_2$ and $p_2 =
p(S_2) = x_1 + x_3$, and we are interested in the values of
$p(S_1 \,\&\, S_2) = x_1$ and $p(S_1 \vee S_2) = x_1 + x_2 + x_3$. It is natural
to assume that a priori, all probability distributions (i.e.,
all points in the simplex $S$) are “equally possible”, i.e., that
there is a uniform distribution (“second-order probability”)
on this set of probability distributions. Then, as a natural
estimate for the probability $p(S_1 \,\&\, S_2)$ of $S_1 \,\&\, S_2$, we
can take the conditional mathematical expectation of this
probability under the condition that $p(S_1) = p_1$
and $p(S_2) = p_2$:

$$E(p(S_1 \,\&\, S_2) \mid p(S_1) = p_1 \,\&\, p(S_2) = p_2) =
E(x_1 \mid x_1 + x_2 = p_1 \,\&\, x_1 + x_3 = p_2).$$
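
The link between this conditional expectation and the midpoint can be sketched as follows. This is our reconstruction of the reasoning step, under the assumption that the uniform distribution on the simplex induces a uniform distribution on the set of points that satisfy the two conditions.

```latex
% The conditions x_1 + x_2 = p_1, x_1 + x_3 = p_2, and x_1 + x_2 + x_3 + x_4 = 1
% leave a single free parameter x_1:
x_2 = p_1 - x_1, \qquad x_3 = p_2 - x_1, \qquad x_4 = 1 - p_1 - p_2 + x_1.
% Non-negativity of x_1, ..., x_4 is equivalent to
\max(p_1 + p_2 - 1, 0) \le x_1 \le \min(p_1, p_2),
% i.e., the admissible points form a straight line segment on which
% x_1 ranges over the interval [p^-, p^+].
% Since the parametrization is affine in x_1, a uniform distribution on
% this segment is uniform in x_1, so the expected value is the midpoint:
E(x_1 \mid x_1 + x_2 = p_1 \,\&\, x_1 + x_3 = p_2)
  = \frac{p^- + p^+}{2} = f_\&(p_1, p_2).
```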

The problem is that these operations are non-associative.
Why is this a problem? If we are interested in estimating
the degree of belief in a conjunction of three statements
$S_1 \,\&\, S_2 \,\&\, S_3$, then we can first apply the “and”
operation to $p_1$ and $p_2$ and get an estimate $f_\&(p_1,p_2)$ for
the probability of $S_1 \,\&\, S_2$, and then apply the “and”
operation to this estimate and $p_3$, and get an estimate
$f_\&(f_\&(p_1,p_2),p_3)$ for the probability of $(S_1 \,\&\, S_2) \,\&\, S_3$.
Alternatively, we can start by combining $S_2$ and $S_3$,
and get an estimate $f_\&(p_1, f_\&(p_2,p_3))$. Intuitively, we
would expect these two estimates to coincide, but, e.g.,
$(0.4 \,\&\, 0.6) \,\&\, 0.8 = 0.2 \,\&\, 0.8 = 0.1$, while $0.4 \,\&\, (0.6 \,\&\, 0.8) =
0.4 \,\&\, 0.5 = 0.2 \neq 0.1$.
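
The numerical example above can be reproduced directly from the midpoint formulas. The sketch below is a straightforward transcription of these formulas; only the Python function names are our choice.

```python
def f_and(p1: float, p2: float) -> float:
    """Midpoint of the interval [max(p1 + p2 - 1, 0), min(p1, p2)]."""
    return 0.5 * max(p1 + p2 - 1.0, 0.0) + 0.5 * min(p1, p2)


def f_or(p1: float, p2: float) -> float:
    """Midpoint of the interval [max(p1, p2), min(p1 + p2, 1)]."""
    return 0.5 * max(p1, p2) + 0.5 * min(p1 + p2, 1.0)


# The two ways of bracketing the conjunction of three statements.
left = f_and(f_and(0.4, 0.6), 0.8)   # (0.4 & 0.6) & 0.8
right = f_and(0.4, f_and(0.6, 0.8))  # 0.4 & (0.6 & 0.8)
```

Up to floating-point rounding, `left` is 0.1 and `right` is 0.2, in agreement with the values in the text.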

How  can  we  solve  this  problem?  Since  we  know  that 
the  numerical  values  are  only  an  approximation,  we  can 
analyze  how  non-associative  the  above  operations  can  be. 
If  the  difference  is  below  the  natural  resolution  level,  then, 
from  the  practical  point  of  view,  the  above  operations  are 
as  good  as  associative  ones.  The  following  is  true: 

Theorem [15], [38].

$$\max_{a,b,c} |f_\&(f_\&(a,b),c) - f_\&(a, f_\&(b,c))| = \frac{1}{9};$$

$$\max_{a,b,c} |f_\vee(f_\vee(a,b),c) - f_\vee(a, f_\vee(b,c))| = \frac{1}{9}.$$

Each word describing a degree of belief is a “granule”
covering an entire sub-interval of values. Thus, non-associativity
is negligible if the corresponding realistic
“granular” degrees of belief have granules of width $\geq 1/9$.
One can fit no more than 9 granules of such width in the
interval [0,1]. This may explain why humans are most
comfortable with $\leq 9$ items to choose from - the famous
“7 plus or minus 2” law; see, e.g., [42], [43].
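
The value 1/9 can be checked numerically. The sketch below is a brute-force illustration, not the proof from [15], [38]: it evaluates the associativity gap of the midpoint operations on a grid of degrees $k/9$, using exact rational arithmetic. The function names and the choice of grid are our assumptions; a grid search by itself only gives a lower bound on the maximum.

```python
from fractions import Fraction
from itertools import product


def f_and(a, b):
    """Midpoint 'and'-operation on exact rational degrees."""
    return (max(a + b - 1, 0) + min(a, b)) / 2


def f_or(a, b):
    """Midpoint 'or'-operation on exact rational degrees."""
    return (max(a, b) + min(a + b, 1)) / 2


def max_gap(op, steps: int) -> Fraction:
    """Largest |op(op(a,b),c) - op(a,op(b,c))| over the grid {0, 1/steps, ..., 1}."""
    grid = [Fraction(k, steps) for k in range(steps + 1)]
    return max(abs(op(op(a, b), c) - op(a, op(b, c)))
               for a, b, c in product(grid, repeat=3))


# One triple on which the 'and'-operation attains the gap 1/9.
a, b, c = Fraction(4, 9), Fraction(5, 9), Fraction(7, 9)
gap_at_triple = abs(f_and(f_and(a, b), c) - f_and(a, f_and(b, c)))
```

On this grid, the largest gap for both operations equals 1/9, and it is attained, e.g., for the degrees 4/9, 5/9, and 7/9.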

D.  Multiresolutional  Character  of  Uncertainty  Reasoning 
Helps  to  Drastically  Narrow  Down  the  Class  of  Possible 
Logics 

These  results  cover  both  the  logics  in  which  the  set  of 
different  degrees  is  an  interval  [0,1],  and  more  complex 
logics. 


D.1 [0,1]-Based Logics

For numerical operations, if we interpret the degree of
belief in a statement $S$ as (proportional to) the number
of arguments in favor of $S$, then we arrive at a natural
choice of “and”- and “or”-operations: $f_\&(a,b) = a \cdot b$,
$f_\vee(a,b) = a + b$, and $f_\to(a,b) = b^a$. As one of the unexpected
consequences, we get a surprising relation with the
entropy techniques, well known in the probabilistic approach
to uncertainty [60].

A  similar  conclusion  can  be  made  if  we  require  that  the 
operations  be  consistent  with  their  multiresolutional  struc¬ 
ture:  namely,  for  a  discrete  low-resolution  level,  we  define 
“derivatives”  of  these  operations  as  finite  differences,  and 
then require that the corresponding continuous limit operations
have exactly the same expressions for the derivatives [4].

The multiresolutional character of human reasoning also
explains why in logic, only unary and binary operations
are normally used: although in principle, there
exist ternary operations on [0,1] (in the limit case) which
cannot be represented as compositions of natural unary
and binary ones, on each resolution level, when we
have only finitely many degrees, every operation can be
naturally represented as such a composition [51].

D.2  More  General  Logics 

The need for more general logics comes from the fact that
just like experts are not sure about the statement $S$, they
are also not sure about their own degrees of belief $d(S)$.
Thus, instead of a single number $d(S)$, we can consider
several possible numbers $d$, with degrees $d_2(d)$ describing
to what extent these numbers are adequate descriptions
of the original expert’s uncertainty. This “second-order”
approach has several successful applications. In principle,
it is possible to go further and consider the fact that the
degrees $d_2(d)$ are also not given precisely, so we seem to
need the third-order, fourth-order, etc., approaches. However, in
practice, such theoretically possible approaches turned out
not to be useful. This fact can be explained if we take the
multiresolutional character of reasoning into consideration:

•  On  the  one  hand,  every  “first-order”  and  “second-order” 
logic, in which the set of degrees of belief is an ordered set,
can  be  naturally  described  as  a  limit  of  an  interval-related 
multiresolutional  procedure  [27],  [28],  [45],  [76]. 

•  On  the  other  hand,  if  degrees  come  from  words,  then  the 
third  order  is  no  longer  necessary  [30]. 

It  is  natural  to  select  a  continuous  approach  which  best 
reflects  the  multiresolutional  character  of  human  reason¬ 
ing,  i.e.,  in  which  there  is  a  qualitative  difference  between 
different pairs of degrees. A natural way to describe this
difference in the continuous case is to use the approach of non-standard
analysis, with the actual infinitesimal elements
(= lexicographic ordering). The optimal selection of such
logics is described in [37], [54].

Conclusion 

Interval  mathematics  is  very  helpful  in  the  analysis  of 
multiresolutional  systems. 